Abstract: Given the single-letter capacity formula and the 
converse proof of a channel without input constraints, we provide 
a simple approach to extend the results to the same channel 
with input constraints. The resulting capacity formula is 
the minimum of a Lagrange dual function. It gives a unified 
formula in the sense that it works regardless of whether the 
problem is convex. If the problem is non-convex, we show that the 
capacity can be larger than the formula obtained by the naive 
approach of imposing constraints on the maximization in the 
capacity formula of the case without the constraints. 

The converse proof is extended simply by adding a 
term involving the Lagrange multiplier and the constraints. The 
rest of the proof does not need to be changed. We name the proof 
method the Lagrangian Converse Proof. In contrast, traditional 
approaches need to construct a better input distribution for 
convex problems or need to introduce a time sharing variable 
for non-convex problems. We illustrate the Lagrangian Converse 
Proof for three channels: the classic discrete time memoryless 
channel, the channel with non-causal channel-state information at 
the transmitter, and the channel with limited channel-state 
feedback. The extension to the rate distortion theory is also 
provided. 

Index Terms — Converse, Coding Theorem, Capacity, Rate 
Distortion, Duality, Lagrange Dual Function 

I. Introduction 

Naively imposing input constraints on the maximization 
in the single-letter capacity formula of a channel without 
input constraints often produces the capacity formula of the 
same channel with the constraints. For example, the classic 
discrete time memoryless channel without input constraints 
has capacity 

$$C' = \max_{p_X} I(X;Y),$$

and with a power constraint, the capacity is 

$$C = \max_{p_X :\, E[X^2] \le \rho_0} I(X;Y).$$

Such cases are so prevalent that one may suspect it is always 
the case. We started with this belief while working on channels 
with limited channel-state feedback. Denote the single-letter 
capacity for the case without constraint as 

$$C' = \max\ \text{Mutual Information}. \tag{1}$$

Contrary to the conventional belief, we found in [1], [2] that 
the capacity for the case with the constraint can be larger than 

$$R = \max_{\text{constraint}}\ \text{Mutual Information},$$

and the capacity can be expressed as 

\begin{align}
C &= \min_{\lambda \ge 0}\ \text{Lagrange Dual Function}(\lambda) \tag{2} \\
  &= R + \text{Duality Gap} \notag \\
  &\ge R, \tag{3}
\end{align}

where the Lagrange dual function [3] to the primary problem 
$R$ accounts for the constraint. 

Capacity formula (2) reduces to the maximum of the mutual 
information when the duality gap is zero. Equation (2) is 
therefore a unified expression for cases with nonzero 
or zero duality gaps. 

During the discovery of the capacity result for the channel 
with limited feedback and with constraints, we found a new 
proof of the converse part of the capacity theorem. It is 
obtained by modifying the converse proof for the case without 
the constraints: a term involving the Lagrange multiplier and 
the constraints is added to the second-to-last expression. 
The rest of the proof is unchanged. We call such a proof 
the Lagrangian Converse Proof. With little modification, the 
method can also be used to prove the converse part of the 
rate distortion theorem. The unexpected simplicity and the 
potential to obtain new results with ease motivate us to report 
it here. 

A meaningful theory should be able to explain the past and 
predict the future. In this paper, we show that the Lagrangian 
Converse Proof can simplify the existing proof of the capacity 
of the classic discrete memoryless channels and the proof 
of the capacity of the channels with non-causal channel-state 
information at the transmitters (CSIT) [4]-[6]. In addition, we 
illustrate how to use it to obtain new capacity results of the 
channels with limited channel-state feedback [1], [2]. 

To understand why the capacity can be greater than the 
maximum of the mutual information as shown in (3), we provide 
a convex hull explanation of the capacity region of the single 
user channel. Even for single user channels, investigating 
the capacity region is meaningful when the capacity needs to 
be achieved using time sharing. The minimum of the Lagrange 
dual function conveniently characterizes the capacity region's 
boundary points without explicitly employing time sharing. 
The intuition is that when the duality gap is greater than zero, 
multiple solutions to (2) exist. Some solutions are below the 
constraint and some are above the constraint. A time sharing 
of the solutions will achieve the capacity and at the same 
time satisfy the constraint exactly. Therefore, the capacity can 
alternatively be expressed as the maximum of the time sharing 
of the mutual information. 

In summary, the contributions of the paper are as follows. 

• A simple converse proof is provided for the capacities of 
channels with constraints and for rate distortion theorems; 

• Expressed using the Lagrange dual function, a unified 
capacity formula is presented and shown to have an 
intimate relation to the convex hull of the capacity region 
and the time sharing. Free of time sharing variables, the 
expression also makes the calculation of the capacities 
easier. The capacity formula also has a pleasant symmetric 
relation to the rate distortion function. 

In Section II, the simplicity of the Lagrangian Converse 
Proof is illustrated for three channels: the discrete memoryless 
channel, the channel with non-causal channel-state information, 
and the channel with limited channel-state feedback. For 
the latter, the relation among the capacity formula, the capacity 
region, and the time sharing is explained. In Section III, the 
converse proof is extended to the rate distortion theory. The 
dual relation of channel capacity and rate distortion is briefly 
discussed. Section IV summarizes the usage of the proposed 
converse proof. 

II. The Lagrangian Converse Proof for Channel 
Capacities 

There are two traditional methods of converse proof for 
channels with input constraints. The first method takes 
advantage of the convexity of the problem and produces a 
better input distribution from any input distribution induced 
by the information message and the code. This better input 
distribution must also satisfy the input constraints. Section 
II-A compares this method with the Lagrangian Converse 
Proof for the classic discrete memoryless channels. The second 
method is to introduce a time sharing variable for non-convex 
problems. Sections II-B and II-C compare it with the new 
converse proof for channels with non-causal channel-state 
information at the transmitter and for channels with limited 
feedback, for which an example of nonzero duality gap is 
provided. 

A. The Capacity of the Discrete Memoryless Channels 

The channel $(\mathcal{X}, p_{Y|X}, \mathcal{Y})$ in Figure 1 is a memoryless 
channel with finite alphabets $(\mathcal{X}, \mathcal{Y})$ for input $X \in \mathcal{X}$ and 
output $Y \in \mathcal{Y}$. The inputs over $N$ channel uses satisfy the 
constraint 

$$\frac{1}{N} \sum_{n=1}^{N} E[\alpha(X_n)] \le \rho_0, \tag{4}$$

where the expectation is over the information message and 
$\alpha(\cdot) : \mathcal{X} \to \mathbb{R}$ is a real valued function. For example, it is a 
power constraint if $\alpha(X) = X^2$. 



It is well known that the capacity of this channel without 
the constraint is 

$$C_1' = \max_{p_X} I(X;Y), \tag{5}$$

and with the constraint, the capacity is 

$$R_1 = \max_{p_X :\, E[\alpha(X)] \le \rho_0} I(X;Y). \tag{6}$$

The Lagrange dual function of (6) is 

$$L_1(\lambda, \rho_0) = \max_{p_X} \left\{ I(X;Y) - \lambda \left( E[\alpha(X)] - \rho_0 \right) \right\}, \tag{7}$$

which is an upper bound to $R_1$ for all $\lambda \ge 0$, because the 
penalty term is nonnegative for every $p_X$ that satisfies the 
constraint $E[\alpha(X)] \le \rho_0$ [3]. The duality gap $G_1$ 
is defined as the least upper bound minus $R_1$, i.e., 

$$G_1 = \inf_{\lambda \ge 0} L_1(\lambda, \rho_0) - R_1.$$

Because the mutual information is a convex $\cap$ function of the 
input distribution $p_X$ and the input constraint is convex, $R_1$ 
is a convex $\cap$ function of $\rho_0$. Therefore, the duality gap $G_1$ 
is zero [7] and the capacity can be expressed as 

\begin{align}
C_1 &= \min_{\lambda \ge 0} L_1(\lambda, \rho_0) \tag{8} \\
    &= L_1(\lambda^*, \rho_0) = R_1. \tag{9}
\end{align}
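The relation among (6), (7), and (8) can be checked numerically on a small example. The following is a minimal sketch under assumptions that are not part of the text: a binary symmetric channel with crossover probability 0.1, the cost $\alpha(x) = x$ (so that $E[\alpha(X)] = P(X=1)$), the budget $\rho_0 = 0.2$, and plain grid searches over the input distribution and over $\lambda$. All names in the code are illustrative.

```python
import math

def h2(q):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

EPS = 0.1    # assumed BSC crossover probability (illustrative)
RHO0 = 0.2   # assumed cost budget rho_0, with alpha(x) = x (illustrative)

P_GRID = [i / 1000 for i in range(1001)]   # candidate values of P(X = 1)
LAM_GRID = [j / 200 for j in range(601)]   # candidate multipliers in [0, 3]

def mutual_information(p):
    """I(X;Y) in bits for a BSC(EPS) with P(X = 1) = p."""
    q = p * (1.0 - EPS) + (1.0 - p) * EPS  # P(Y = 1)
    return h2(q) - h2(EPS)

MI = [mutual_information(p) for p in P_GRID]

def primal(rho0):
    """R_1 of (6): maximize I(X;Y) subject to E[alpha(X)] <= rho0."""
    return max(mi for p, mi in zip(P_GRID, MI) if p <= rho0)

def dual_function(lam, rho0):
    """L_1 of (7): unconstrained maximum of the penalized objective."""
    return max(mi - lam * (p - rho0) for p, mi in zip(P_GRID, MI))

def dual_minimum(rho0):
    """Right-hand side of (8): minimum over the grid of lam >= 0."""
    return min(dual_function(lam, rho0) for lam in LAM_GRID)

r1 = primal(RHO0)
c1 = dual_minimum(RHO0)
gap = c1 - r1   # nonnegative by weak duality; close to zero for this convex problem
```

On this grid the dual minimum never falls below the constrained maximum, and the difference is negligible, which is the behavior expected for a convex problem with zero duality gap.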

We compare the converse proof with and without the 
constraint. The last step of the converse proof for the case 
without the constraint is 

$$\sum_{n=1}^{N} I(X_n;Y_n) \le N C_1',$$

where $C_1'$ dominates $I(X_n;Y_n)$ for every $n$. With the input 
constraint, the additional steps of the traditional proof of the 
converse [8, Chapter 7.3] are 

\begin{align}
\sum_{n=1}^{N} I(X_n;Y_n) &\le N I(X;Y) \tag{10} \\
                          &\le N R_1, \tag{11}
\end{align}

where, unlike the case without input constraint, $R_1$ may 
not dominate every $I(X_n;Y_n)$ because the constraint (4) is 
averaged over $N$ channel uses and thus it is possible that 
$E[\alpha(X_n)] > \rho_0$ for some $n$. One has to construct a new input 
distribution $p_X(x) = \frac{1}{N} \sum_{n=1}^{N} p_{X_n}(x)$ and use the property 
that the mutual information is a convex $\cap$ function of the input 
distribution to obtain (10). Luckily, the new input distribution 
satisfies the constraint $E[\alpha(X)] \le \rho_0$ because $E[\alpha(X)]$ is a 
linear, and hence convex, function of $p_X$, and thus one obtains (11). 

Fig. 2. A channel with non-causal channel-state information at the transmitter 

Using the Lagrangian Converse Proof, the key step is to add 
a Lagrange multiplier term: 

\begin{align}
\sum_{n=1}^{N} I(X_n;Y_n)
  &\le \sum_{n=1}^{N} \left( I(X_n;Y_n) - \lambda^* \left( E[\alpha(X_n)] - \rho_0 \right) \right) \tag{12} \\
  &\le N C_1, \tag{13}
\end{align}

where $\lambda^* \ge 0$ is the solution in (9); (12) follows 
from the fact that the $X_n$'s satisfy the constraint and thus 
$-\lambda^* \left( \sum_{n=1}^{N} E[\alpha(X_n)] - N \rho_0 \right) \ge 0$; (13) follows from 
the fact that $C_1$ of (9) dominates the summand in (12) for 
every $n$, as in the case without constraints, because the power 
penalty $\lambda^* (E[\alpha(X_n)] - \rho_0)$ punishes excessive power use. 
The simplification is that we do not need to construct a better 
input distribution $p_X$. This is significant when there is no 
obvious way to find a better $p_X$. 
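The domination step (13) can be written out per channel use. The following is a short sketch that relies only on the definitions (7) to (9); it spells out why no new input distribution has to be constructed. For each $n$, the distribution $p_{X_n}$ induced by the code is just one candidate in the unconstrained maximization of (7):

```latex
\begin{align*}
I(X_n;Y_n) - \lambda^*\bigl(E[\alpha(X_n)] - \rho_0\bigr)
  &\le \max_{p_X}\Bigl\{ I(X;Y) - \lambda^*\bigl(E[\alpha(X)] - \rho_0\bigr) \Bigr\} \\
  &= L_1(\lambda^*,\rho_0) \\
  &= C_1 , \qquad n = 1,\dots,N .
\end{align*}
```

Summing over $n$ gives (13). The bound holds even for those $n$ with $E[\alpha(X_n)] > \rho_0$, because the maximization in (7) is not restricted to distributions that meet the constraint.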

B. The Capacity of Channels with Non-causal Channel-state 
Information at the Transmitter 

As shown in Figure 2, the memoryless channel with finite 
alphabets is characterized by $(\mathcal{X}, \mathcal{Y}, \mathcal{S}_1, \mathcal{S}_2, p_{S_1,S_2}, p_{Y|X,S_1,S_2})$, 
where $X \in \mathcal{X}$ is the channel input; $(S_1, S_2) \in (\mathcal{S}_1, \mathcal{S}_2)$ is 
the channel-state with distribution $p_{S_1,S_2}$; $p_{Y|X,S_1,S_2}$ is the 
channel transition probability; and $(Y \in \mathcal{Y}, S_2 \in \mathcal{S}_2)$ is 
the channel output, i.e., the channel-state $S_2$ is non-causally 
known at the receiver. The channel-state $S_1$ is non-causally 
known at the transmitter. In the proof of the converse, the 
inputs over $N$ channel uses satisfy the constraint 

$$\frac{1}{N} \sum_{n=1}^{N} E[\alpha(X_n)] \le \rho_0, \tag{14}$$

where the expectation is over the information message and the 
state $S_1$. 

Without input constraints, the capacity is directly obtained 
in [5] or can be obtained from [4] by considering 
$(Y \in \mathcal{Y}, S_2 \in \mathcal{S}_2)$ as the channel output. The capacity is 

$$C_2' = \max_{U,\, X = \varphi(U,S_1),\, p_{U|S_1}} I(U;S_2,Y) - I(U;S_1),$$

where $X$ is a deterministic function of $U$ and $S_1$, and $U \in \mathcal{U}$ is 
an auxiliary random variable. 

With the input constraint, the capacity is 

\begin{align}
C_2 &= \min_{\lambda \ge 0} L_2(\lambda, \rho_0) \tag{15} \\
    &= L_2(\lambda^*, \rho_0) \tag{16} \\
    &= R_2, \tag{17}
\end{align}

where 

$$R_2 = \max_{U,\, X = \varphi(U,S_1),\, p_{U|S_1} :\, E[\alpha(X)] \le \rho_0} I(U;S_2,Y) - I(U;S_1); \tag{18}$$

$$L_2(\lambda, \rho_0) = \max_{U,\, X = \varphi(U,S_1),\, p_{U|S_1}} \left\{ I(U;S_2,Y) - I(U;S_1) - \lambda \left( E[\alpha(X)] - \rho_0 \right) \right\} \tag{19}$$

is the Lagrange dual function to the primary problem (18); 

$$E[\alpha(X)] = \sum_{s_1} \sum_{u} p_{S_1}(s_1)\, p_{U|S_1}(u|s_1)\, \alpha(\varphi(u,s_1));$$

(17) follows from the fact that $U$ can include a time sharing 
variable [7] in it, and thus $R_2$ is a convex $\cap$ function of $\rho_0$, 
and therefore the duality gap is zero. 

The traditional proof for the case with the constraint 
introduces a time sharing variable as follows [6]. 

\begin{align}
I(W; Y_1^N, S_{2,1}^N)
  &\le \sum_{n=1}^{N} I(U_n; Y_n, S_{2,n}) - I(U_n; S_{1,n}) \tag{20} \\
  &= N \left( I(U; Y, S_2 | Q) - I(U; S_1 | Q) \right) \tag{21} \\
  &= N \left( I(U,Q; Y, S_2) - I(Q; Y, S_2) - I(U,Q; S_1) + I(Q; S_1) \right) \tag{22} \\
  &\le N \left( I(U,Q; Y, S_2) - I(U,Q; S_1) \right) \tag{23} \\
  &= N \left( I(U; Y, S_2) - I(U; S_1) \right) \tag{24} \\
  &\le N R_2, \tag{25}
\end{align}

where $W$ is the information message; 
$U_n = (W, Y_1^{n-1}, S_{2,1}^{n-1}, S_{1,n+1}^{N})$; (20) is obtained in [5]; (21) 
is obtained by the definition of conditional mutual 
information and by letting $Q$ be uniformly distributed 
over $\{1, 2, \ldots, N\}$, $U = U_Q$, $S_1 = S_{1,Q}$, $S_2 = S_{2,Q}$, 
and $Y = Y_Q$; (22) follows from the chain rule of 
the mutual information; (23) follows from the fact that 
$\{S_{1,1}, \ldots, S_{1,N}\}$ are i.i.d. and thus $I(Q;S_1) = 0$; (24) 
follows from redefining $U$ as $(U,Q)$; (25) follows from 
$\frac{1}{N} \sum_{n=1}^{N} E[\alpha(X_n)] = E[\alpha(X_Q)] = E[\alpha(X)] \le \rho_0$ and the 
fact that (24) is a convex $\cup$ function of $p_{X|U,S_1}$ when $p_{U|S_1}$ 
is fixed, which implies that the optimal $X$ is a deterministic 
function of $U$ and $S_1$. 

Using the Lagrangian Converse Proof, the same capacity 
result can be obtained without resorting to the time sharing 
variable: 

\begin{align}
I(W; Y_1^N, S_{2,1}^N)
  &\le \sum_{n=1}^{N} I(U_n; Y_n, S_{2,n}) - I(U_n; S_{1,n}) \notag \\
  &\le \sum_{n=1}^{N} \left[ I(U_n; Y_n, S_{2,n}) - I(U_n; S_{1,n}) - \lambda^* \left( E[\alpha(X_n)] - \rho_0 \right) \right] \tag{26} \\
  &\le N C_2, \tag{27}
\end{align}

where (26) follows from the fact that the $X_n$'s satisfy the average 
power constraint. 

So far, we have seen two examples where the duality gap 
is zero. One might worry whether the proof works when the 
duality gap is not zero. In the next subsection, we show that 
it works even when the duality gap is not zero. 
C. Capacity of Channels with Limited Feedback and Input 
Constraint 

We consider a channel with designable finite-rate/limited 
feedback. As shown in Figure 3, the memoryless channel with 
finite alphabets is characterized by $(\mathcal{X}, \mathcal{Y}, \mathcal{V}, \mathcal{U}, p_V, p_{Y|X,V})$, 
where $X \in \mathcal{X}$ is the channel input, $V \in \mathcal{V}$ is the 
channel-state with distribution $p_V$, $p_{Y|X,V}$ is the channel transition 
probability, and $(Y \in \mathcal{Y}, V \in \mathcal{V})$ is the channel output, i.e., 
the channel-state is known at the receiver. For the $n$-th channel 
use, the transmitter receives a causal, but not strictly causal, 
finite-rate, and error free channel-state feedback 
$U_n \in \mathcal{U} = \{1, \ldots, 2^{R_{\mathrm{fb}}}\}$ from the receiver. The feedback $U_n$ could be 
designed as a deterministic or random function of the current 
channel-state $V_n$ and/or the past channel-states $V_1^{n-1}$. Because the 
receiver produces $U_n$, it is assumed known to the receiver. In 
the proof of the converse, the inputs over $N$ channel uses 
satisfy the constraint 

$$\frac{1}{N} \sum_{n=1}^{N} E[\alpha(X_n)] \le \rho_0, \tag{28}$$

where the expectation is over the information message and the 
feedback. 

The capacity [9] of this channel without input constraint is 

\begin{align}
C_3' &= \max_{\varphi(\cdot),\, p_{X|U}} I(X;Y | U = \varphi(V), V) \notag \\
     &= \max_{\varphi(\cdot),\, p_{X|U}} \sum_{v} p_V(v)\, I\left( p_{X|U}(\cdot | \varphi(v)),\, p_{Y|X,V}(\cdot | \cdot, v) \right), \tag{29}
\end{align}

where the important claim is that the feedback $U = \varphi(V)$ is a 
deterministic and memoryless function of the current 
channel-state $V$; in (29) the mutual information is written as a function 
of its input distribution and its channel transition probability. 

Based on $C_3'$, one might expect the capacity with input 
constraint to be 

$$R_3 = \max_{\varphi(\cdot),\, p_{X|U} :\, E[\alpha(X)] \le \rho_0} I(X;Y | U = \varphi(V), V). \tag{30}$$

The surprising result is that the capacity may be larger than 
$R_3$. 

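Theorem 1: [1], [2] The capacity of the channel 
$(\mathcal{X}, \mathcal{Y}, \mathcal{V}, \mathcal{U}, p_V, p_{Y|X,V})$ with designable finite-rate 
($|\mathcal{U}| = 2^{R_{\mathrm{fb}}}$) channel-state feedback and input constraint $\rho_0$ is 

\begin{align}
C_3 &= \min_{\lambda \ge 0} L_3(\lambda, \rho_0) \tag{31} \\
    &= L_3(\lambda^*, \rho_0) \tag{32} \\
    &= R_3 + \text{duality gap} \notag \\
    &\ge R_3, \notag
\end{align}

where 

$$L_3(\lambda, \rho_0) = \max_{\varphi(\cdot),\, p_{X|U}} \left\{ I(X;Y | U = \varphi(V), V) - \lambda \left( E[\alpha(X)] - \rho_0 \right) \right\} \tag{33}$$

is the Lagrange dual function to the primary problem (30). 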

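1) Without the Input Constraint: We first review the key 
steps of the converse proof without the input constraint [2]. 
The mutual information between the information message and 
the received signal is bounded as 

\begin{align}
I(W; Y_1^N | V_1^N, U_1^N)
  &\le \sum_{n=1}^{N} \sum_{u^{n-1}} p(u^{n-1})\, f_3\left( f_1^{(n,u^{n-1})}(u|v),\, f_2^{(n,u^{n-1})}(x|u) \right) \tag{34} \\
  &\le \sum_{n=1}^{N} \sum_{u^{n-1}} p(u^{n-1})\, f_3\left( p^*_{U|V}(u|v),\, p^*_{X|U}(x|u) \right) \tag{35} \\
  &= N C_3', \tag{36}
\end{align}

where (34) is obtained in [1], [2] and 

\begin{align*}
f_3\left( f_1(u|v), f_2(x|u) \right) &= \sum_{v} p_V(v) \sum_{u} f_1(u|v)\, I\left( f_2(\cdot|u),\, p_{Y|X,V}(\cdot|\cdot,v) \right), \\
f_2^{(n,u^{n-1})}(x|u) &= p_{X_n|U_n,U^{n-1}}(x|u,u^{n-1}).
\end{align*}

Let $p^*_{U|V}(u|v)$ and $p^*_{X|U}(x|u)$ be the solution to 

$$\max_{p_{U|V}(u|v),\, p_{X|U}(x|u)} f_3\left( p_{U|V}(u|v), p_{X|U}(x|u) \right) = \sum_{v} p_V(v) \sum_{u} p^*_{U|V}(u|v)\, I\left( p^*_{X|U}(\cdot|u),\, p_{Y|X,V}(\cdot|\cdot,v) \right).$$

Note that $p^*_{U|V}(u|v)$ and $p^*_{X|U}(x|u)$ are not functions of 
$u^{n-1}$ because $f_3(\cdot,\cdot)$ is not a function of $u^{n-1}$. Furthermore, 
$f_3\left( p_{U|V}(u|v), p_{X|U}(x|u) \right)$ is a linear function of the simplex 
$\{p_{U|V}(u|v), u \in \mathcal{U}\}$, and thus the optimal $p^*_{U|V}(u|v)$ is 
obtained at the extreme point $p^*_{U|V}(u|v) = \delta[u - \varphi^*(v)]$ for some 
$\varphi^*(\cdot)$, where $\delta[x] = 1$ if $x = 0$ and $\delta[x] = 0$ elsewhere. 

Therefore, (35) and (36) are obtained. 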
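2) With the Input Constraint: The traditional 
method reviewed in Section II-A will not work 
here. One cannot produce a better feedback function 
and input distribution $(p_{U|V}(u|v), p_{X|U}(x|u))$ by 
averaging $(f_1^{(n,u^{n-1})}(u|v), f_2^{(n,u^{n-1})}(x|u))$ because 
$f_3\left( p_{U|V}(u|v), p_{X|U}(x|u) \right)$ is not a convex function of 
$(p_{U|V}(u|v), p_{X|U}(x|u))$. One could instead introduce a 
time sharing variable, as shown in Section II-B, but the time 
sharing variable cannot be absorbed into an existing auxiliary 
variable of the capacity formula as in (24). 

Therefore, we resort to the Lagrangian Converse Proof [2]. 
The key steps are 

\begin{align}
I(W; Y_1^N | V_1^N, U_1^N)
  &\le \sum_{n=1}^{N} \sum_{u^{n-1}} p(u^{n-1})\, f_3\left( f_1^{(n,u^{n-1})}(u|v),\, f_2^{(n,u^{n-1})}(x|u) \right)
       - \lambda^* \sum_{n=1}^{N} \left( E[\alpha(X_n)] - \rho_0 \right) \tag{37} \\
  &\le \sum_{n=1}^{N} \sum_{u^{n-1}} p(u^{n-1})\, f_4\left( p^*_{U|V}(u|v),\, p^*_{X|U}(x|u),\, \lambda^* \right) \tag{38} \\
  &= N C_3, \tag{39}
\end{align}

where $\lambda^*$ is the solution to (32); (37) follows from 
the fact that the constraint is satisfied and thus 
$-\lambda^* \sum_{n=1}^{N} \left( E[\alpha(X_n)] - \rho_0 \right) \ge 0$; and 

$$f_4\left( f_1(u|v), f_2(x|u), \lambda \right) = \sum_{v} p_V(v) \sum_{u} f_1(u|v) \left[ I\left( f_2(\cdot|u),\, p_{Y|X,V}(\cdot|\cdot,v) \right) - \lambda \left( \sum_{x} f_2(x|u)\, \alpha(x) - \rho_0 \right) \right].$$

Let $p^*_{U|V}(u|v)$ and $p^*_{X|U}(x|u)$ be the solution to 

$$\max_{p_{U|V}(u|v),\, p_{X|U}(x|u)} f_4\left( p_{U|V}(u|v), p_{X|U}(x|u), \lambda^* \right).$$

Again, because $f_4(\cdot,\cdot,\cdot)$ is not a function of $u^{n-1}$ and 
$f_4\left( p_{U|V}(u|v), p_{X|U}(x|u), \lambda^* \right)$ is a linear function of the 
simplex $\{p_{U|V}(u|v), u \in \mathcal{U}\}$, one obtains that 
$p^*_{U|V}(u|v) = \delta[u - \varphi^*(v)]$ and $p^*_{X|U}(x|u)$ are not functions of $u^{n-1}$. 
Therefore, (38) and (39) are obtained. 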

3) Relation of the Lagrange Dual Function to the Time 
Sharing and the Capacity Region: In the following, we 
illustrate the central role of the Lagrange dual function $L_3$ 
from two aspects. 

Time Sharing: We first discuss a time sharing expression 
$C_3^{TS}$ of the capacity and then show that $C_3^{TS} = C_3$ using 
the Lagrange dual function $L_3$. The alternative converse proof 
using time sharing is as follows. Define the random variable 
$Q_1$ to be uniformly distributed over $\{1, \ldots, N\}$ and another 
one to be $Q_2 = U^{Q_1 - 1}$. Then define the time sharing random 
variable $Q = (Q_1, Q_2) \in \mathcal{Q}$. We obtain 

\begin{align}
I(W; Y_1^N | V_1^N, U_1^N)
  &\le \sum_{n=1}^{N} \sum_{u^{n-1}} p(u^{n-1})\, f_3\left( f_1^{(n,u^{n-1})}(u|v),\, f_2^{(n,u^{n-1})}(x|u) \right) \notag \\
  &= N \sum_{q} p_Q(q) \sum_{v} p_V(v) \sum_{u} p_{U|V,Q}(u|v,q)\, I\left( p_{X|U,Q}(\cdot|u,q),\, p_{Y|X,V}(\cdot|\cdot,v) \right) \notag \\
  &= N I(X;Y | U, V, Q) \tag{40} \\
  &\le N C_3^{TS}, \tag{41}
\end{align}

where 

$$C_3^{TS} = \max_{Q,\, p_Q,\, \varphi_Q(\cdot),\, p_{X|U,Q} :\, E[\alpha(X)] \le \rho_0} I(X;Y | U = \varphi_Q(V), V, Q), \tag{42}$$

and (41) follows from the fact that (40) is a linear function of 
the simplex $\{p_{U|V,Q}(u|v,q), u \in \mathcal{U}\}$ and thus the deterministic 
feedback $U = \varphi_Q(V)$ does not lose the optimality. 

It turns out that the Lagrange dual function in (33) is 
not only the dual to the primary problem $R_3$ in (30), but also 
the dual to the optimization of $C_3^{TS}$ in (42): 

\begin{align}
L_3^{TS}(\lambda, \rho_0) &= \max_{Q,\, p_Q,\, \varphi_Q(\cdot),\, p_{X|U,Q}} \sum_{q} p_Q(q) \left\{ I(X;Y | U = \varphi_q(V), V, Q = q) - \lambda \left( E[\alpha(X) | Q = q] - \rho_0 \right) \right\} \tag{43} \\
  &= L_3, \tag{44}
\end{align}

where (44) follows from the fact that the function to be optimized 
in (43) is a linear function of the simplex $\{p_Q(q), q \in \mathcal{Q}\}$ and 
thus the optimal solution is obtained at a certain $q^*$ for which 
$p_Q(q^*) = 1$. Therefore, the one dual function for two primary 
problems shows that $C_3^{TS} = C_3$. 
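The claim that one dual function serves both the naive problem and the time sharing problem can be illustrated on a finite toy model. The sketch below assumes a hypothetical set of four fixed strategies, each summarized by a (cost, rate) pair; the numbers are invented for illustration and do not come from the channel in Theorem 1. It compares the naive value (the analogue of $R_3$), the minimum of the dual function (the analogue of $C_3$), and the best time sharing of two strategies (the analogue of $C_3^{TS}$), using exact rational arithmetic.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical (cost, rate) pairs, one per fixed strategy; illustrative values only.
POINTS = [(F(0), F(0)), (F(1), F(1)), (F(2), F(6, 5)), (F(4), F(3))]

def naive_rate(rho0):
    """Analogue of R_3: best single strategy that meets the cost constraint."""
    return max(r for rho, r in POINTS if rho <= rho0)

def dual_function(lam, rho0):
    """Analogue of L_3: unconstrained maximum of rate minus cost penalty."""
    return max(r - lam * (rho - rho0) for rho, r in POINTS)

def dual_minimum(rho0):
    """Minimize the convex piecewise-linear dual over lam >= 0.

    The minimum is attained at lam = 0 or where two of the lines cross,
    so only those candidates are evaluated. Returns (value, lam).
    """
    candidates = [F(0)]
    for (rho_i, r_i), (rho_j, r_j) in combinations(POINTS, 2):
        if rho_i != rho_j:
            lam = (r_j - r_i) / (rho_j - rho_i)
            if lam >= 0:
                candidates.append(lam)
    return min((dual_function(lam, rho0), lam) for lam in candidates)

def time_sharing_rate(rho0):
    """Analogue of C_3^TS: best mixture of two strategies with average cost rho0."""
    best = naive_rate(rho0)
    for pair in combinations(POINTS, 2):
        lo, hi = sorted(pair)
        if lo[0] != hi[0] and lo[0] <= rho0 <= hi[0]:
            theta = (hi[0] - rho0) / (hi[0] - lo[0])   # weight of the cheaper strategy
            best = max(best, theta * lo[1] + (1 - theta) * hi[1])
    return best
```

For the assumed points and $\rho_0 = 3$, the naive value is $6/5$, while the dual minimum and the best time sharing both equal $7/3$, attained at $\lambda = 2/3$. The difference $7/3 - 6/5$ plays the role of the duality gap in this toy model.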

Capacity Regions: We show that the Lagrange dual function 
$L_3$ characterizes the boundary points of the two expressions, 
$\mathcal{C}_3^{TS}$ and $\mathcal{C}_3$, of the single user capacity region. Equation (40) 
shows that any achievable rate $r$ under constraint $\rho$ must 
belong to the following capacity region: 

$$\mathcal{C}_3^{TS} = \text{closure} \left( \bigcup_{Q,\, p_Q,\, p_{U|V,Q},\, p_{X|U,Q}} \mathcal{C}_{3,\text{Fixed}}^{TS}\left( Q, p_Q, p_{U|V,Q}, p_{X|U,Q} \right) \right), \tag{45}$$

where 

$$\mathcal{C}_{3,\text{Fixed}}^{TS}\left( Q, p_Q, p_{U|V,Q}, p_{X|U,Q} \right) = \left\{ (r, \rho) : 0 \le r \le I(X;Y | U, V, Q),\ E[\alpha(X)] \le \rho \right\}. \tag{46}$$

Note that, following the lead of Gallager in the study of the 
non-convex multiple access capacity region [10], we have included 
$\rho$ to make the capacity region a two dimensional set. Since a 
convex hull performs the time sharing, an equivalent 
capacity region is 

$$\mathcal{C}_3 = \text{closure} \left( \text{convex} \left( \bigcup_{p_{U|V},\, p_{X|U}} \mathcal{C}_{3,\text{Fixed}}\left( p_{U|V}, p_{X|U} \right) \right) \right) = \mathcal{C}_3^{TS}, \tag{47}$$

where 

$$\mathcal{C}_{3,\text{Fixed}}\left( p_{U|V}, p_{X|U} \right) = \left\{ (r, \rho) : 0 \le r \le I(X;Y | U, V),\ E[\alpha(X)] \le \rho \right\}. \tag{48}$$

[Figure 4 panel title: Comparison of random feedback and time sharing] 

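Characterizing the boundary of $\mathcal{C}_3^{TS}$ and $\mathcal{C}_3$ can be reduced 
to solving the Lagrange dual function $L_3$. Let $(1, -\lambda)$ be the 
normal vector of a hyperplane. Finding the points of $\mathcal{C}_3^{TS}$ that 
touch the hyperplane needs to solve 

$$\max_{(r,\rho) \in \mathcal{C}_3^{TS}} (1, -\lambda) \cdot (r, \rho) = \max_{(r,\rho) \in \mathcal{C}_3^{TS}} r - \lambda \rho,$$

which can be reduced to 

\begin{align*}
r &= I(X;Y | U, V, Q) \\
\rho &= E[\alpha(X)] \\
B_3^{TS}(\lambda) &= L_3^{TS}(\lambda, \rho_0) + \lambda \rho_0 \\
                  &= L_3(\lambda, \rho_0) + \lambda \rho_0.
\end{align*}

The same is true for $\mathcal{C}_3$: 

\begin{align*}
r &= I(X;Y | U, V) \\
\rho &= E[\alpha(X)] \\
B_3(\lambda) &= L_3(\lambda, \rho_0) + \lambda \rho_0.
\end{align*}

Therefore, we have seen that the Lagrange dual function 
plays the central role in connecting the boundary points of the 
capacity region and the capacity expressions: 

\begin{align*}
B_3^{TS}(\lambda) - \lambda \rho_0 &= B_3(\lambda) - \lambda \rho_0 \\
  &= L_3^{TS}(\lambda, \rho_0) \\
  &= L_3(\lambda, \rho_0) \\
  &\ge \min_{\lambda \ge 0} L_3(\lambda, \rho_0) \\
  &= C_3(\rho_0) = C_3^{TS}(\rho_0).
\end{align*}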

Remark 1: Expressing the capacity as the minimum of the 
Lagrange dual function also helps to calculate the capacity 
because one does not need to worry about the time sharing 
while performing the optimization. If multiple solutions, i.e., 
input distributions etc., achieve the same value of the Lagrange 
dual function, then the capacity achieving strategy is a time 
sharing of these solutions and the time sharing coefficients are 
chosen to satisfy the constraint. See [1], [2] for details. 
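The recipe in Remark 1 can be sketched in a few lines. The example below is a hypothetical illustration, not the procedure of [1], [2]: strategies are again summarized by invented (cost, rate) pairs, the multiplier `lam_star` is assumed to be already known from minimizing the dual function, and the helper names are made up for this sketch. The code finds the strategies that tie for the maximum of the penalized objective and mixes the cheapest and the costliest of them so that the average cost equals $\rho_0$.

```python
from fractions import Fraction as F

# Hypothetical (cost, rate) pairs of four fixed strategies; illustrative values only.
POINTS = [(F(0), F(0)), (F(1), F(1)), (F(2), F(6, 5)), (F(4), F(3))]

def maximizers(points, lam):
    """Strategies that achieve the largest penalized value r - lam * rho."""
    best = max(r - lam * rho for rho, r in points)
    return [(rho, r) for rho, r in points if r - lam * rho == best]

def time_sharing(points, lam_star, rho0):
    """Return [(weight, strategy), ...] whose average cost equals rho0."""
    tied = sorted(maximizers(points, lam_star))
    (rho_a, r_a), (rho_b, r_b) = tied[0], tied[-1]
    if not rho_a <= rho0 <= rho_b:
        raise ValueError("rho0 is not bracketed by the tied solutions")
    if rho_a == rho_b:
        return [(F(1), (rho_a, r_a))]          # a single solution, no sharing needed
    theta = (rho_b - rho0) / (rho_b - rho_a)   # fraction of time on the cheaper one
    return [(theta, (rho_a, r_a)), (1 - theta, (rho_b, r_b))]

# Assumed multiplier lam_star = 2/3 and budget rho0 = 3 for the points above.
mix = time_sharing(POINTS, F(2, 3), F(3))
avg_cost = sum(w * rho for w, (rho, r) in mix)
avg_rate = sum(w * r for w, (rho, r) in mix)
```

With these assumed values, two strategies tie, the cheaper one is used one third of the time, the average cost is exactly 3, and the average rate is $7/3$.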

Example 1: To illustrate the capacity with nonzero duality 
gap, we produced an example, whose detailed derivation is 
given in [1], [2]. The channel is an additive Gaussian noise 
channel with three states, good, moderate, and bad, 
corresponding to small, moderate, and large noise variances. 
The feedback is limited to 1 bit/channel use. For a small 
long-term average power constraint, the optimal strategy is to turn 
on the transmitter with a fixed power only when the channel 
is in the good state, as shown by the dotted curve in Figure 4. 
For a large power constraint, the optimal strategy is to turn 
on the transmitter when the channel is in the good or moderate 
state with another fixed power, as shown by the solid curve 
in Figure 4. For a power constraint in between, the optimal 
strategy is a time sharing of the above two strategies, as shown 
by the line segment terminated by the "o"s. The gap between 
the line segment and the maximum of the dotted and the solid 
curves is exactly the nonzero duality gap between $C_3$ and $R_3$. 
The slope of the line segment is $\lambda^*$. The "+" markers are for 
random feedback discussed in [1], [2]. 

Fig. 4. An example of nonzero duality gap. 

III. The Extension to the Rate Distortion Theory 

A. The Converse Proof 

It is straightforward to extend the Lagrangian Converse 
Proof to the rate distortion theory. We illustrate it using the 
classic i.i.d. source as an example. The rate distortion function 
of quantizing an i.i.d. source $X$ to $\hat{X}$ in a vector manner is 

$$R_1'(D) = \min_{p_{\hat{X}|X} :\, E[d(X,\hat{X})] \le D} I(X;\hat{X}),$$

where $d(\cdot,\cdot)$ measures the distortion. Using the Lagrange dual 
function, we have another expression 

$$R_1(D) = \max_{\lambda \ge 0} L_1(\lambda, D), \tag{49}$$

where 

$$L_1(\lambda, D) = \min_{p_{\hat{X}|X}} \left\{ I(X;\hat{X}) + \lambda \left( E[d(X,\hat{X})] - D \right) \right\}.$$

In general, the Lagrange dual function is a lower bound and 
we have $R_1(D) \le R_1'(D)$. Due to the convexity of the mutual 
information, we have $R_1(D) = R_1'(D)$. 
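The equality of the dual expression (49) and the constrained minimum can be checked numerically in a simple case. The sketch below makes assumptions beyond the text: the source is a uniform binary source with Hamming distortion, the target is $D = 0.1$, and the test channel $p_{\hat{X}|X}$ is restricted to binary symmetric channels with crossover probability $\delta$, for which $I(X;\hat{X}) = 1 - h(\delta)$ and $E[d(X,\hat{X})] = \delta$. This symmetric family is known to be sufficient for this particular source. The optimizations are plain grid searches and all names are illustrative.

```python
import math

def h2(q):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

D_TARGET = 0.1   # assumed distortion target (illustrative)

DELTA_GRID = [i / 1000 for i in range(501)]   # test-channel crossover in [0, 0.5]
LAM_GRID = [j / 100 for j in range(1001)]     # candidate multipliers in [0, 10]

# I(X; X_hat) in bits for a uniform binary source through a BSC(delta).
RATES = [1.0 - h2(delta) for delta in DELTA_GRID]

def primal(D):
    """R_1'(D): minimize the rate subject to E[d] = delta <= D."""
    return min(r for delta, r in zip(DELTA_GRID, RATES) if delta <= D)

def dual_function(lam, D):
    """L_1(lam, D): unconstrained minimum of rate plus distortion penalty."""
    return min(r + lam * (delta - D) for delta, r in zip(DELTA_GRID, RATES))

def dual_maximum(D):
    """R_1(D) of (49): maximum over the grid of lam >= 0."""
    return max(dual_function(lam, D) for lam in LAM_GRID)

r_primal = primal(D_TARGET)
r_dual = dual_maximum(D_TARGET)
rd_gap = r_primal - r_dual   # nonnegative: the dual is a lower bound
```

In this sketch the dual value never exceeds the constrained minimum and the two agree up to the grid resolution, mirroring $R_1(D) \le R_1'(D)$ with equality in the convex case.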
The last few steps of the conventional converse proof are [11] 

\begin{align}
\sum_{n=1}^{N} I(X_n;\hat{X}_n)
  &\ge \sum_{n=1}^{N} R_1'\left( E[d(X_n,\hat{X}_n)] \right) \notag \\
  &\ge N R_1'\left( \frac{1}{N} \sum_{n=1}^{N} E[d(X_n,\hat{X}_n)] \right) \tag{50} \\
  &= N R_1'(D), \notag
\end{align}

where (50) uses the property that $R_1'(D)$ is a convex $\cup$ 
function of $D$. 

The Lagrangian Converse Proof does not need to prove 
the convexity property of $R_1'(D)$ before performing the 
converse proof: 

\begin{align}
\sum_{n=1}^{N} I(X_n;\hat{X}_n)
  &\ge \sum_{n=1}^{N} \left( I(X_n;\hat{X}_n) + \lambda^* \left( E[d(X_n,\hat{X}_n)] - D \right) \right) \tag{51} \\
  &\ge N R_1(D), \tag{52}
\end{align}

where $\lambda^* \ge 0$ is the solution to (49); (51) follows from the 
fact that the distortion requirement is satisfied by the $\hat{X}_n$'s and 
thus $\lambda^* \left( \sum_{n=1}^{N} E[d(X_n,\hat{X}_n)] - N D \right) \le 0$; (52) follows 
from the fact that $R_1(D)$ lower bounds the summand in (51) 
for every $n$. 

The benefit of the Lagrangian Converse Proof may not 
appear to be significant in this simple example. But it can be 
easily applied to more complex cases when the time sharing 
has to be used in R[{D). Another example is when there 
are other constraints in addition to the distortion, in which 
case, simply introducing more Lagrange multipliers solves the 
problem. 
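As a sketch of the multiple-constraint case, suppose there are two distortion measures $d_1, d_2$ with targets $D_1, D_2$ and multipliers $\lambda_1, \lambda_2 \ge 0$. These symbols are introduced here only for illustration and do not appear elsewhere in the paper. The dual function and the converse steps would then take the following form, where $(\lambda_1^*, \lambda_2^*)$ maximizes the dual function:

```latex
\begin{align*}
L(\lambda_1,\lambda_2,D_1,D_2)
  &= \min_{p_{\hat X|X}} \Bigl\{ I(X;\hat X)
     + \lambda_1\bigl(E[d_1(X,\hat X)] - D_1\bigr)
     + \lambda_2\bigl(E[d_2(X,\hat X)] - D_2\bigr) \Bigr\}, \\
\sum_{n=1}^{N} I(X_n;\hat X_n)
  &\ge \sum_{n=1}^{N} \Bigl( I(X_n;\hat X_n)
     + \lambda_1^*\bigl(E[d_1(X_n,\hat X_n)] - D_1\bigr)
     + \lambda_2^*\bigl(E[d_2(X_n,\hat X_n)] - D_2\bigr) \Bigr) \\
  &\ge N\, L(\lambda_1^*,\lambda_2^*,D_1,D_2)
   = N \max_{\lambda_1,\lambda_2 \ge 0} L(\lambda_1,\lambda_2,D_1,D_2).
\end{align*}
```

The first inequality uses that both average distortion requirements are met, so each penalty sum is nonpositive. The second uses that every summand is one candidate in the minimization that defines $L$. Whether this bound is achievable is a separate question that depends on the duality gap.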



B. Dual Relation between Channel Capacity and Rate Distortion 

We note that using expressions involving Lagrange dual 
functions, the channel capacity and the rate distortion function 
have a pleasant symmetric form, as evident in $C_1(\rho_0)$ (8) 
and $R_1(D)$ (49) for channels without side information. The 
symmetric form shows a dual relation in the sense of [5]. 

It can be easily extended to the case of non-causal side 
information considered in [5], where the constraints on the 
capacity are not considered. With the constraint, the capacity 
(53) and the rate distortion function (54) are shown at the end 
of the paper. The dual relation defined in [5] is the following 
isomorphism. 



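Channel Capacity                      Rate Distortion 
------------------------------------  ------------------------------------ 
min                                   max 
max                                   min 
$-\lambda$                            $+\lambda$ 
Transmitted Symbol $X$                $\hat{X}$ Estimation 
Received Symbol $Y$                   $X$ Source 
State to Encoder $S_1$                $S_2$ State to Decoder 
State to Decoder $S_2$                $S_1$ State to Encoder 
Auxiliary $U$                         $U$ Auxiliary 
Input Cost $\alpha(\cdot)$            $d(\cdot,\cdot)$ Distortion Measure 
Input Constraint $\rho_0$             $D$ Distortion 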



A stronger dual relation is defined in [12], where the 
capacity and the rate distortion can be made equal by selecting 
proper constraints. But it does not work when the optimal 
solutions need time sharing. Since (53) and (54) do not 
include the time sharing variables, it is left for future research to 
see whether the stronger dual relation can be established with 
some modification. 

The dual relation for the limited feedback case is not 
discussed here. The reason is that the not-strictly-causal feedback 
to the transmitter in channel capacity corresponds to finite rate 
state information to the decoder in rate distortion. While the 
encoder in channel capacity cannot use future feedback, the 
decoder in rate distortion can wait to use both past and future 
finite rate state information. 

IV. Conclusions 

We have introduced a simple converse proof that uses the 
Lagrange dual function to upper bound the information rate. 
It provides the following approach to deal with constraints: 
1) Based on the capacity of the channel without constraints, 
express the capacity for the case with the constraints as the 
minimum of the Lagrange dual function; 2) Simply modify 
the converse proof for the case without the constraints by 
adding to the second to the last expression a term involving 
the Lagrange multiplier and the constraints, to produce the 
converse proof for the case with the constraints; 3) For the 
achievability, study the duality gap to determine whether the 
time sharing is needed. 

We show that the unified capacity expression, 

$$C = \min_{\lambda \ge 0}\ \text{Lagrange Dual Function}(\lambda),$$

plays a central role in connecting the characterization of the 
single user capacity region, the time sharing capacity formula, 
and the formula that results from imposing the constraint on the 
maximization in the capacity formula of the case without 
constraints. The Lagrangian capacity formula works regardless 
of whether the problem is convex. This formula also 
simplifies the evaluation of the capacity by deferring the 
consideration of the time sharing. 

The above is extended to the rate distortion theory. A symmetric 
form of the capacity and the rate distortion function is shown to 
demonstrate the dual relation between them. Further extension 
to the case of multiple constraints is straightforward. We 
have discussed the single-letter capacity formula in this paper. 
The extension of the Lagrangian Converse Proof to multi-letter 
capacity formulas, multiaccess channels, and broadcast 
channels is deferred to future research. 

$$C_2(\rho_0) = \min_{\lambda \ge 0}\ \max_{U,\, X = \varphi(U,S_1),\, p_{U|S_1}} \left\{ I(U;S_2,Y) - I(U;S_1) - \lambda \left( E[\alpha(X)] - \rho_0 \right) \right\} \tag{53}$$

$$R_2(D) = \max_{\lambda \ge 0}\ \min_{U,\, \hat{X} = f(U,S_2),\, p_{U|X,S_1}} \left\{ I(U;S_1,X) - I(U;S_2) + \lambda \left( E[d(X,\hat{X})] - D \right) \right\} \tag{54}$$